Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing
نویسندگان
چکیده
Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing’s open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensive and scalable. Our novel process relies on programmatic gold creation to provide targeted training feedback to workers and to prevent common scamming scenarios. We find that it decreases the amount of manual work required to manage crowdsourced labor while improving the overall quality of the results.
منابع مشابه
Worker Perception of Quality Assurance Mechanisms in Crowdsourcing and Human Computation Markets
Many human computation systems utilize crowdsourcing marketplaces to recruit workers. Because of the open nature of these marketplaces, requesters need to use appropriate quality assurance mechanisms to guarantee high quality results. Previous research has mostly focused on the statistical aspects of quality assurance. Instead, we analyze the worker perception of five quality assurance mechanis...
متن کاملEffective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration
Researchers and scientists have been using crowdsourcing platforms to collect labeled training data in recent years. The process is cost-effective and scalable, but research has shown that the quality of truth inference is unstable due to worker bias, work variance, and task difficulty. In this demonstration, we present a hybrid system, named IDLE (Integrated Data Labeling Engine), that brings ...
متن کاملMentor: A Visualization and Quality Assurance Framework for Crowd-Sourced Data Generation
Crowdsourcing is a feasible method for collecting labeled datasets for training and evaluating machine learning models. Compared to the expensive process of generating labeled datasets using dedicated trained judges, the low cost of data generation in crowdsourcing environments enables researchers and practitioners to collect significantly larger amounts of data for the same cost. However, crow...
متن کاملBehavior-Based Quality Assurance in Crowdsourcing Markets
Quality assurance in crowdsourcing markets has appeared to be an acute problem over the last years. We propose a quality control method inspired by Statistical Process Control (SPC), commonly used to control output quality in production processes and characterized by relying on time-series data. Behavioral traces of users may play a key role in evaluating the performance of work done on crowdso...
متن کاملOntology Quality Assurance with the Crowd
The Semantic Web has the potential to change the Web as we know it. However, the community faces a significant challenge in managing, aggregating, and curating the massive amount of data and knowledge. Human computation is only beginning to serve an essential role in the curation of these Web-based data. Ontologies, which facilitate data integration and search, serve as a central component of t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011